Abstract



We propose to use novel and classical audio and text signal-processing techniques, among others,
for "inexpensive", fast writer identification of scanned hand-written documents, performed "visually".
Here "inexpensive" refers to the efficiency of the identification process in terms of CPU cycles while
preserving decent accuracy for preliminary identification. This is a comparative study of multiple
algorithm combinations in a pattern recognition pipeline implemented in Java around the open-source
Modular Audio Recognition Framework (MARF), which can do a lot more than audio processing. We
present our preliminary experimental findings for this identification task. We simulate "visual"
identification by "looking" at the hand-written document as a whole rather than trying to extract
fine-grained features from it prior to classification.

Keywords: writer identification, Modular Audio Recognition Framework (MARF), signal processing,
simulation

1 Introduction 

1.1 Problem Statement 

Current techniques for writer identification often rely on the classical tools, methodologies, and
algorithms of handwriting recognition (and of image-based pattern recognition in general), such as
skeletonizing, contouring, line-based and angle-based feature extraction, and many others. These
techniques then compare the extracted features to the "style" of features a given writer has in the
trained database of known writers. While highly accurate, these classical techniques are time consuming
when bulk processing a large volume of digitized handwritten material for preliminary or secondary
identification of who may have written what.

1.2 Proposed Solution 

We simulate "quick visual identification" of a writer's handwriting by looking at a page of hand-written
text as a whole to speed up the process of identification, especially when one needs to do a quick
preliminary classification of a large volume of documents. To do so, we treat the sample pages as either
1D or 2D arrays of data and apply 1D or 2D loading using various loading methods, followed by 1D or 2D
filtering. In the case of 2D filtering, we flatten the 2D array into 1D prior to feature extraction. We
then continue with the classical feature extraction, training, and classification tasks using the
comprehensive algorithm set implemented within the Modular Audio Recognition Framework (MARF), roughly
treating each hand-written image sample as a wave form, as is done, e.g., in speaker identification. We
insist on 1D as it is the baseline storage mechanism for MARF and it consumes less storage while being
sufficient to achieve high accuracy in the writer identification task.

Figure 1: MARF's Pattern Recognition Pipeline

This approach is in a way similar to the one where MARF was applied to file type analysis for
forensic purposes [20], using machine learning and treating each file as a sort of signal, as
compared to the traditional file utility on Unix systems.



1.3 Introduction to MARF 



Modular Audio Recognition Framework (MARF) is an open-source collection of pattern recognition
APIs and their implementation for unsupervised and supervised machine learning and classification,
written in Java [13, 24, 17, 15, 16, 18]. One of its design purposes is to act as a testbed for trying
out common and novel algorithms found in the literature and industry for sample loading, preprocessing,
feature extraction, training, and classification. One of the main goals and design approaches of MARF
is to provide scientists with a tool for comparing algorithms in a homogeneous environment, allowing
dynamic module selection (from the implemented modules) based on the configuration options supplied
by applications. Over the course of several years MARF has accumulated a fair number of implementations
for each of the pipeline stages, allowing reasonably comprehensive comparative studies of algorithm
combinations and of their combined behavior and other properties when used for various pattern
recognition tasks. MARF is also designed to be very configurable while retaining generality and sane
default settings so that it runs well "off the shelf". Owing to the generality of its design and
implementation, MARF, its derivatives, and its applications have also been used beyond audio processing
tasks in [22], [23], and other unpublished or in-progress works.

1.3.1 Classical Pattern Recognition Pipeline

The conceptual pattern recognition pipeline shown in Figure 1 depicts the core of the data flow and
transformation between the stages of MARF's pipeline. The inner boxes represent most of the available
concrete module implementations or stubs. The grayed-out boxes are either stubs or partly implemented
modules. The white boxes signify implemented algorithms. Generally, the whole pattern recognition
process starts by loading a sample (e.g. an audio recording in a wave form, a text, or an image file),
preprocessing it (removing noisy and "silent" data and other unwanted elements), then extracting the
most prominent features, and finally either training the system, so that it learns a new set of
features of a given subject, or classifying the sample to identify who or what the subject is. The
outcome of training is either a collection of some form of feature vectors or their mean or median
clusters, which are stored per learned subject. The outcome of classification is a unique 32-bit
integer, usually indicating who or what the system believes the subject is. MARF is designed to be
usable as a standalone marf.jar file with no dependencies on other libraries. Optionally, debug
versions of marf.jar depend on JUnit when it is used for unit testing.
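
The training and classification outcomes described above can be illustrated with a minimal sketch.
It is not MARF's API: the class MeanClusterSketch and its methods are hypothetical names, and the
squared Euclidean distance is assumed purely for concreteness. Training keeps one mean feature vector
per subject, and classification returns the integer ID of the subject with the nearest mean.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of mean-cluster training; not MARF's actual API.
final class MeanClusterSketch {
    private final Map<Integer, double[]> sums = new HashMap<>();
    private final Map<Integer, Integer> counts = new HashMap<>();

    // Training: accumulate the feature vector under the subject's integer ID.
    // All feature vectors are assumed to have the same length.
    void train(int subjectId, double[] features) {
        double[] sum = sums.computeIfAbsent(subjectId, k -> new double[features.length]);
        for (int i = 0; i < features.length; i++) {
            sum[i] += features[i];
        }
        counts.merge(subjectId, 1, Integer::sum);
    }

    // The stored "cluster" for a subject is the mean of its training vectors.
    double[] mean(int subjectId) {
        double[] sum = sums.get(subjectId);
        int n = counts.get(subjectId);
        double[] m = new double[sum.length];
        for (int i = 0; i < sum.length; i++) {
            m[i] = sum[i] / n;
        }
        return m;
    }

    // Classification: return the ID whose mean is nearest (squared Euclidean
    // distance, an assumption), or -1 if nothing has been trained yet.
    int classify(double[] features) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int id : sums.keySet()) {
            double[] m = mean(id);
            double d = 0.0;
            for (int i = 0; i < m.length; i++) {
                double diff = features[i] - m[i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = id;
            }
        }
        return best;
    }
}
```

In this sketch a subject trained on two pages is represented by the average of the two page vectors,
which mirrors the "mean cluster" option mentioned above; a median cluster would replace the averaging
step.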

1.3.2 Algorithms 

MARF includes concrete implementations of the framework's API for a number of algorithms to demonstrate
its abilities in various pipeline stages and modules. A number of modules are in the process of being
implemented or ported from other projects for comparative studies and did not make it into this work
at the time of writing. Thus, the following is an incomplete summary of the implemented algorithms
corresponding to Figure 1, with very brief descriptions:

• Fast Fourier transform (FFT), used in FFT-based filtering as well as feature extraction.

• Linear predictive coding (LPC), used in feature extraction.

• Artificial neural network (classification).

• Various distance classifiers (Chebyshev, Euclidean, Minkowski [1], Mahalanobis [12], Diff
(internally developed within the project, roughly similar in behavior to the UNIX/Linux diff
utility), and Hamming).

• Cosine similarity measure, which was thoroughly discussed in [9] and often produces the
best accuracy in this work in many configurations (see further).

• Zipf's Law-based classifier [25].

• General probability classifier.

• Continuous Fraction Expansion (CFE)-based filters.

• A number of math-related tools for matrix and vector processing, including complex-number
matrix and vector operations, and statistical estimators used in smoothing of sparse matrices (e.g.
in probabilistic matrices or the Mahalanobis distance's covariance matrix). All these are needed for
MARF to be self-contained.
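
To make the distance and similarity measures in the list above concrete, the following is a minimal
sketch of four of them using their textbook definitions. The class DistanceSketch and its method
names are hypothetical and are not taken from MARF; both input vectors are assumed to have the same
length.

```java
// Hypothetical helper with textbook formulas; not MARF's implementation.
final class DistanceSketch {
    // Minkowski distance of order r: (sum |a_i - b_i|^r)^(1/r), r >= 1.
    static double minkowski(double[] a, double[] b, double r) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.pow(Math.abs(a[i] - b[i]), r);
        }
        return Math.pow(sum, 1.0 / r);
    }

    // Euclidean distance is the Minkowski distance with r = 2.
    static double euclidean(double[] a, double[] b) {
        return minkowski(a, b, 2.0);
    }

    // Chebyshev distance: the largest absolute coordinate difference.
    static double chebyshev(double[] a, double[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); larger means more similar.
    // Returns NaN if either vector has zero length (norm of 0).
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Note that a distance classifier picks the subject with the smallest value, whereas a similarity
classifier picks the subject with the largest one.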

2 Methodology

To enable the experiments in this work, we had to alter MARF's pipeline through its plug-in
architecture. We outline the modifications and the experiments conducted using a variety of options.

2.1 Modified MARF's Pipeline 

Slight modifications to MARF's original pipeline were required in order to enable some of the
experiments outlined below for the writer identification tasks. Thanks to MARF's extensible
architecture, we can make those modifications as plug-ins, primarily for sample loading and
preprocessing, which we plan to integrate into the core of MARF.

2.1.1 Loaders 

We experiment with diverse scanned image sample loading mechanisms to see which contribute to
better accuracy and which are the most efficient. There are naive and less naive approaches to do so.
We can treat the incoming sample as:

• an image, essentially a 2D array, naturally

• a byte stream, i.e. just a 1D array of raw bytes

• a text file, treating the incoming bytes as text, also 1D

• a wave form, as if it were an encoded WAVE file, also 1D

Internally, regardless of the initial interpretation of the scanned hand-written image samples, the
data is always treated as some wave form or another. The initial loading affects the outcome
significantly, and we experimented to see which interpretation yields better results; we present them
as options. For this to work, we had to design and implement ImageSampleLoader as a plug-in external
to MARF to properly decode the image data as an image and return a 2D array representation of it (it
is later converted to 1D for further processing). We adapted the ImageSample and ImageSampleLoader
previously designed for the TestFilters application of MARF for 2D filter tests. The other loaders were
already available in MARF's implementation, but had to be subclassed or wrapped to override some
settings. Specifically, we rely on ByteArrayFileReader, TextLoader, and WAVLoader in the core MARF.
Since the first of these does not directly implement the ISampleLoader interface, we also created an
external wrapper plug-in, RawSampleLoader. The TextLoader provides options for loading the data as
uni-, bi-, and tri-gram models, i.e. one sample point consists of one, two, or three characters.
The WAVLoader allows treating the incoming sample at different sample rates as well, e.g. 8000 Hz,
16000 Hz, and so on. We had to create a TIFFtoWAVLoader plug-in for this research work to allow the
treatment of TIFF files as WAV with the proper format settings.
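
As an illustration of what treating image bytes "as WAV" can mean, the sketch below reinterprets a raw
byte array as mono PCM amplitudes with 2 bytes per sample point. It is only a sketch under stated
assumptions: the class BytesAsWaveSketch is a hypothetical name, the samples are assumed to be signed
16-bit little-endian, and the handling of file headers is not shown. It does not reproduce the actual
TIFFtoWAVLoader.

```java
// Hypothetical illustration; not the TIFFtoWAVLoader plug-in itself.
final class BytesAsWaveSketch {
    // Interpret each pair of bytes as one signed 16-bit little-endian PCM
    // sample and normalize it to the range [-1.0, 1.0).
    // A trailing odd byte, if any, is ignored.
    static double[] toAmplitudes(byte[] raw) {
        int n = raw.length / 2;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            int lo = raw[2 * i] & 0xFF;   // low byte, treated as unsigned
            int hi = raw[2 * i + 1];      // high byte, keeps its sign
            short sample = (short) ((hi << 8) | lo);
            out[i] = sample / 32768.0;
        }
        return out;
    }
}
```

The sample rate does not change these amplitude values; it only determines how the 1D sequence is
mapped to time, and therefore to frequencies, in the later FFT-based stages.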

2.1.2 Filters 

The Filter Framework of MARF and its API, represented by the IFilter interface, has to be invoked
with the 2D versions of the filters instead of the 1D ones, which are a sufficient default for audio
signal processing. The Filter Framework has a 2D processing API that can be applied to images
"line-by-line". The 2D API of IFilter returns a 2D result. In order for it to be usable by the rest
of the pipeline, it has to be "flattened" into a 1D array. The "flattening" can be done row-by-row or
column-by-column; we experiment with both. Once the data is flattened, the rest of the pipeline
functions as normal. Since there is no such default preprocessing module in the core MARF, we implement
it as a preprocessing plug-in in this work, which we call Filter2Dto1D. This class implements the
preprocess() method to behave in the described way. The class itself does not do much; instead, the
FFT-based set of filters is mirrored from the core MARF into this plug-in to adhere to the new
implementation of preprocess() while delegating all the work to the core modules. Thus, we have the
base FFTFilter2Dto1D and the concrete LowPass2Dto1D, HighPass2Dto1D, BandStop2Dto1D, and
BandPass2Dto1D FFT-based filters. The CFE-based filters require further testing at this point and as
such were not included in the experiments.
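
The two flattening orders mentioned above can be sketched as follows. The class FlattenSketch is a
hypothetical name used only for illustration, not the Filter2Dto1D plug-in, and the input is assumed to
be a rectangular array (all rows of equal length).

```java
// Hypothetical illustration of 2D-to-1D flattening; not the actual plug-in.
final class FlattenSketch {
    // Row-by-row: all of row 0, then all of row 1, and so on.
    static double[] byRow(double[][] m) {
        if (m.length == 0) {
            return new double[0];
        }
        int rows = m.length, cols = m[0].length;
        double[] out = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[r * cols + c] = m[r][c];
            }
        }
        return out;
    }

    // Column-by-column: all of column 0, then all of column 1, and so on.
    static double[] byColumn(double[][] m) {
        if (m.length == 0) {
            return new double[0];
        }
        int rows = m.length, cols = m[0].length;
        double[] out = new double[rows * cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                out[c * rows + r] = m[r][c];
            }
        }
        return out;
    }
}
```

For a scanned page, row order keeps horizontally adjacent pixels next to each other in the resulting
signal, while column order keeps vertically adjacent pixels together, so the two orders yield different
1D wave forms from the same image.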

2.1.3 Noise Removal 

We employ two basic methodologies of noise removal in our experiments. (1) We remove the noise by
loading a "noise" sample, a scanned "blank" sheet with no writing on it. Subtracting the frequencies
of this noise sample from the incoming samples gives us the net effect of large noise removal. This
FFT sample-based noise remover is only effective for the 2D preprocessing operations.
Implementation-wise, we implement it in the SampleBasedNoiseRemover preprocessing plug-in class.
(2) We compare that to the default noise removal in MARF, which applies the plain 1D low-pass
FFT filter.
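
One common way to subtract the frequencies of a noise sample is magnitude spectral subtraction; the
sketch below shows that technique on a single 1D line for brevity. It is an assumption about how such
a remover could work, not the code of SampleBasedNoiseRemover: the class name is hypothetical, a naive
O(n^2) DFT stands in for the FFT, and the sample and the noise are assumed to have the same length.

```java
// Hypothetical illustration of magnitude spectral subtraction on one line.
final class SpectralSubtractionSketch {
    // Naive discrete Fourier transform; returns {real parts, imaginary parts}.
    static double[][] dft(double[] x) {
        int n = x.length;
        double[] re = new double[n];
        double[] im = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(angle);
                im[k] += x[t] * Math.sin(angle);
            }
        }
        return new double[][] {re, im};
    }

    // Subtract the noise magnitude from the sample magnitude per frequency
    // bin (clamped at zero), keep the sample's phase, and transform back.
    static double[] subtractNoise(double[] sample, double[] noise) {
        int n = sample.length;
        double[][] s = dft(sample);
        double[][] z = dft(noise);
        double[] re = new double[n];
        double[] im = new double[n];
        for (int k = 0; k < n; k++) {
            double magS = Math.hypot(s[0][k], s[1][k]);
            double magN = Math.hypot(z[0][k], z[1][k]);
            double scale = magS == 0.0 ? 0.0 : Math.max(0.0, magS - magN) / magS;
            re[k] = s[0][k] * scale;
            im[k] = s[1][k] * scale;
        }
        // Inverse DFT; only the real part is kept.
        double[] out = new double[n];
        for (int t = 0; t < n; t++) {
            for (int k = 0; k < n; k++) {
                double angle = 2.0 * Math.PI * k * t / n;
                out[t] += re[k] * Math.cos(angle) - im[k] * Math.sin(angle);
            }
            out[t] /= n;
        }
        return out;
    }
}
```

In a 2D setting the same subtraction could be applied line by line, in the spirit of the
"line-by-line" 2D filter API described earlier.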

2.2 WriterIdentApp

We provide a testing application, called WriterIdentApp, to run all the experiments in this work and
gather statistics. The application is a writer-identification-oriented fork of SpeakerIdentApp, which
is present within MARF's repository [24] for the speaker identification task. The application has been
amended with options to accept the four loader types instead of two, noise removal by subtraction of
the noise sample, and the 2D filtering plug-ins. The rest of the application options are roughly the
same as those of SpeakerIdentApp. Like all of MARF and its applications, WriterIdentApp will be
released as open source and can be made available upon request prior to that.

2.3 Resolution 

We vary the resolution of our samples as 600dpi, 300dpi, and 96dpi in our experiments to see how it 
affects the accuracy of the identification. The samples are both grayscale and black-and-white. 

3 Testing, Experiments, and Results 

The handwritten training samples included two pages scanned from students' quizzes. The testing was
performed on another, third page of the same exam for each student. The total number of students'
exams in the class studied is 25.

3.1 Setup 

In the setup we test multiple permutations of configurable parameters, which are outlined below.

3.1.1 Samples

The samples are letter-sized pages scanned as uncompressed TIFF images at the following resolutions
and color schemes:

• 600 dpi grayscale, black-and-white

• 300 dpi grayscale, black-and-white

• 96 dpi grayscale, black-and-white

3.1.2 Sample Loaders 

• Text loader: unigram, bigram, trigram 

• WAVE loader: PCM, 8000 Hz, mono, 2 bytes per amplitude sample point

• Raw loader: byte loader (1-byte, 2-byte, 4-byte per sample point) 

• TIFF Image 2D loader 

The byte loader and the text loader are similar but not identical. In Java, characters are in Unicode
and physically occupy two bytes, and the text loader uses a character-oriented reader. In the byte
loader, we deal with the raw bytes, and our "n-grams" correspond to powers of 2 (1, 2, or 4 bytes per
sample point).
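
The difference between the two loaders can be sketched as below. This is an illustration only: the
class NgramSketch is a hypothetical name, the byte groups are assumed to be packed big-endian as
unsigned values, the character n-grams are assumed to be non-overlapping, and any incomplete trailing
group is dropped. None of these choices is confirmed to match MARF's loaders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of byte-oriented versus character-oriented grouping.
final class NgramSketch {
    // Byte loader view: pack 1, 2, or 4 raw bytes into one sample point.
    static long[] byteGroups(byte[] raw, int width) {
        if (width != 1 && width != 2 && width != 4) {
            throw new IllegalArgumentException("width must be 1, 2, or 4");
        }
        long[] out = new long[raw.length / width];
        for (int i = 0; i < out.length; i++) {
            long value = 0;
            for (int j = 0; j < width; j++) {
                value = (value << 8) | (raw[i * width + j] & 0xFF);
            }
            out[i] = value;
        }
        return out;
    }

    // Text loader view: group 1, 2, or 3 characters into one sample point.
    static List<String> charNgrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i += n) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }
}
```

Because a Java char is 16 bits wide, a character trigram spans more underlying data than a 4-byte
group does once the bytes have been decoded into characters, which is one reason the two loaders can
behave differently on the same file.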

3.1.3 Preprocessing 

• 1D filtering works with 1D loaders, and the low-pass FFT filter acts as a noise remover

• 2D filtering covered by a plug-in with 2D FFT filters and noise sample subtraction

• Flattening of the 2D data to 1D by row or column

3.1.4 Feature Extraction and Classification 

The principal fastest players in the experimentation so far were the distance and similarity
measure classifiers and, for feature extraction, FFT, LPC, and min/max amplitudes. All these modules are
at their defaults as defined by MARF B4llBllT7lHglHSI. 

3.2 Analysis 

Generally, it appears that the 2D versions of the combinations produce higher accuracy. The text-based
and byte-based loaders perform at an average level, and the wave form loading performs slightly better.
The black-and-white images at all resolutions load faster, as they are much smaller in size, and even
the 96 dpi images performed very well, suggesting the samples need not be of the highest quality.
Algorithm combinations that had silence removed after either 1D or 2D based noise removal contributed
to the best results by eliminating "silence gaps" (in the image, strings of zeros, similar to
compression), thereby making the samples look quite distinct. The noise-sample based removal even
eliminates the printed text and lines of the handed-out exam sheets, keeping only the hand-written
text, and combined with the silence removal pushes the accuracy percentage even higher.
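
The effect of eliminating "silence gaps" can be sketched as follows. This is a minimal illustration
that assumes silence removal simply drops every sample point whose absolute amplitude falls below a
threshold; the class name and the threshold parameter are hypothetical and are not taken from MARF.

```java
import java.util.Arrays;

// Hypothetical illustration of threshold-based silence removal.
final class SilenceRemovalSketch {
    // Keep only the sample points whose magnitude reaches the threshold,
    // so runs of (near-)zero values collapse and the signal gets shorter.
    static double[] removeSilence(double[] samples, double threshold) {
        return Arrays.stream(samples)
                .filter(v -> Math.abs(v) >= threshold)
                .toArray();
    }
}
```

For example, with a threshold of 0.01 the sequence 0, 0, 0.5, 0, -0.3, 0.001 is reduced to 0.5, -0.3,
which is comparable to how long runs of background pixels in a page would be squeezed out.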

The experiments run on two major pieces of hardware: a Dell desktop and a server with two CPUs.
The time it takes to train the system on 50 600 dpi grayscale samples (35 MB each) varies between
15 and 20 minutes on a Dell Precision 370 workstation with 1 GB of RAM and a 1.5 GHz Intel
Pentium 4 processor running Fedora Core 4 Linux. For the 35 MB testing samples, it takes between 4 and
7 seconds depending on the algorithm combination. All the sample files were read off a DVD, so the
performance was less optimal than it would be from a hard disk. On the server, with two Intel Pentium 4
CPUs, 4 GB of RAM, and four processes running, the same workload completes 2-3 times faster. The
96 dpi black-and-white images are very fast to process and offer the best response times.

4 Conclusion

As of this writing, due to the numerous exhaustive combinations and about 600 runs per loader, most of
the experiments and some testing are still underway and are expected to complete within a week. The
resulting tables do not fit within a 6-page conference paper, but will be made available in full upon
request. Some of the fastest results have come back in their entirety, but for now they show a
disappointing accuracy of 20% correctly identified writers in our settings, which is well below what
our hypothesis led us to expect. Since the results are incomplete, the authors are reviewing them as
they come in and looking for faults in the implementation and data. The complete set of positive or
negative outcomes will be summarized in the final version of the article.

4.1 Applications 

We outline possible applications of our quick classification approach: 

• Verification of students' exams in case of fraud claims, to quickly sort the pages into appropriate
bins.

• Processing of large amounts of scanned checks (e.g. the same as banks make available on-line).
Personal check identification can be used to see if the system can tell whether they are written by the
same person. In this case the author used his personal check scans due to their handy availability.

• Quick sorting of hand-written mail.

• Blackmail investigation, when checking whether some letters with threats were written by the same
person, or who that person might be, by taking handwriting samples of the suspects in custody or
under investigation.

4.2 Future Work 

• Further improve recognition accuracy by investigating more algorithms and their properties. 

• Experiment with the CFE-based filters. 

• Automation of training process for the sorting purposes. 

• Export results in Forensic Lucid for forensic analysis. 

4.3 Improving Identification Accuracy 

So far, we have taken a quick approach to writer authentication without using any of the common image
processing techniques, advanced or otherwise, such as contouring, skeletonizing, etc., or the related
feature extraction, such as angles, lines, direction, their relative position, etc. We can "inject"
those approaches into the available pipeline to improve the accuracy of the identification, if we can
live with slower processing speeds due to the additional overhead induced by the algorithms.